iASiS Open Data Graph: Automated Semantic Integration of Disease-Specific Knowledge
In biomedical research, unified access to up-to-date domain-specific
knowledge is crucial, as such knowledge is continuously accumulated in
scientific literature and structured resources. Identifying and extracting
specific information is a challenging task and computational analysis of
knowledge bases can be valuable in this direction. However, for
disease-specific analyses researchers often need to compile their own datasets,
integrating knowledge from different resources, or reuse existing datasets,
that can be out-of-date. In this study, we propose a framework to automatically
retrieve and integrate disease-specific knowledge into an up-to-date semantic
graph, the iASiS Open Data Graph. This disease-specific semantic graph provides
access to knowledge relevant to specific concepts and their individual aspects,
in the form of concept relations and attributes. The proposed approach is
implemented as an open-source framework and applied to three diseases (Lung
Cancer, Dementia, and Duchenne Muscular Dystrophy). Exemplary queries are
presented, investigating the potential of this automatically generated semantic
graph as a basis for retrieval and analysis of disease-specific knowledge.
Comment: 6 pages, 2 figures, accepted in IEEE 33rd International Symposium on
Computer Based Medical Systems (CBMS2020)
Beyond MeSH: Fine-Grained Semantic Indexing of Biomedical Literature based on Weak Supervision
In this work, we propose a method for the automated refinement of subject
annotations in biomedical literature at the level of concepts. Semantic
indexing and search of biomedical articles in MEDLINE/PubMed are based on
semantic subject annotations with MeSH descriptors that may correspond to
several related but distinct biomedical concepts. Such semantic annotations do
not adhere to the level of detail available in the domain knowledge and may not
be sufficient to fulfil the information needs of experts in the domain. To this
end, we propose a new method that uses weak supervision to train a concept
annotator on the literature available for a particular disease. We test this
method on the MeSH descriptors for two diseases: Alzheimer's Disease and
Duchenne Muscular Dystrophy. The results indicate that concept-occurrence is a
strong heuristic for automated subject annotation refinement and its use as
weak supervision can lead to improved concept-level annotations. The
fine-grained semantic annotations can enable more precise literature retrieval,
sustain the semantic integration of subject annotations with other domain
resources and ease the maintenance of consistent subject annotations, as new,
more detailed entries are added to the MeSH thesaurus over time.
Comment: 36 pages, 8 figures; Dictionary-based baselines added and conclusions
updated
Results of the BioASQ tasks of the Question Answering Lab at CLEF 2015
The goal of the BioASQ challenge is to push research towards highly precise biomedical information access systems. We aim to promote systems and approaches that are able to deal with the whole diversity of the Web, especially for, but not restricted to, the context of biomedicine. The third challenge consisted of two tasks: semantic indexing and question answering. 59 systems by 18 different teams participated in the semantic indexing task (Task 3a). The question answering task was further subdivided into two phases. 24 systems from 9 different teams participated in the annotation phase (Task 3b-phase A), while 26 systems from 10 different teams participated in the answer generation phase (Task 3b-phase B). Overall, the best systems were able to outperform the strong baselines provided by the organizers. In this paper, we present the data used during the challenge as well as the technologies which were used by the participants.
Large-scale fine-grained semantic indexing of biomedical literature based on weakly-supervised deep learning
Semantic indexing of biomedical literature is usually done at the level of
MeSH descriptors, representing topics of interest for the biomedical community.
Several related but distinct biomedical concepts are often grouped together in
a single coarse-grained descriptor and are treated as a single topic for
semantic indexing. This study proposes a new method for the automated
refinement of subject annotations at the level of concepts, investigating deep
learning approaches. Lacking labelled data for this task, our method relies on
weak supervision based on concept occurrence in the abstract of an article. The
proposed approach is evaluated on an extended large-scale retrospective
scenario, taking advantage of concepts that eventually become MeSH descriptors,
for which annotations become available in MEDLINE/PubMed. The results suggest
that concept occurrence is a strong heuristic for automated subject annotation
refinement and can be further enhanced when combined with dictionary-based
heuristics. In addition, such heuristics can be useful as weak supervision for
developing deep learning models that can achieve further improvement in some
cases.
Comment: 48 pages, 5 figures, 9 tables, 1 algorithm
The road from manual to automatic semantic indexing of biomedical literature: a 10 years journey
Biomedical experts are facing challenges in keeping up with the vast amount of biomedical knowledge published daily. With millions of citations added to databases like MEDLINE/PubMed each year, efficiently accessing relevant information becomes crucial. Traditional term-based searches may lead to irrelevant or missed documents due to homonyms, synonyms, abbreviations, or term mismatch. To address this, semantic search approaches employing predefined concepts with associated synonyms and relations have been used to expand query terms and improve information retrieval. The National Library of Medicine (NLM) plays a significant role in this area, indexing citations in the MEDLINE database with topic descriptors from the Medical Subject Headings (MeSH) thesaurus, enabling advanced semantic search strategies to retrieve relevant citations despite the synonymy and polysemy of biomedical terms. Over time, advancements in semantic indexing have been made, with Machine Learning facilitating the transition from manual to automatic semantic indexing in the biomedical literature. The paper highlights the journey of this transition, starting with manual semantic indexing and the initial efforts toward automatic indexing. The BioASQ challenge has served as a catalyst in revolutionizing the domain of semantic indexing, further pushing the boundaries of efficient knowledge retrieval in the biomedical field.
Overview of BioASQ 2023: The eleventh BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering
This is an overview of the eleventh edition of the BioASQ challenge in the
context of the Conference and Labs of the Evaluation Forum (CLEF) 2023. BioASQ
is a series of international challenges promoting advances in large-scale
biomedical semantic indexing and question answering. This year, BioASQ
consisted of new editions of the two established tasks b and Synergy, and a new
task (MedProcNER) on semantic annotation of clinical content in Spanish with
medical procedures, which have a critical role in medical practice. In this
edition of BioASQ, 28 competing teams submitted the results of more than 150
distinct systems in total for the three different shared tasks of the
challenge. Similarly to previous editions, most of the participating systems
achieved competitive performance, suggesting the continuous advancement of the
state-of-the-art in the field.
Comment: 24 pages, 12 tables, 3 figures. CLEF2023. arXiv admin note: text
overlap with arXiv:2210.0685
Overview of BioASQ 2021-MESINESP track. Evaluation of advance hierarchical classification techniques for scientific literature, patents and clinical trials
CLEF 2021 – Conference and Labs of the Evaluation Forum, September 21–24, 2021, Bucharest, Romania
There is a pressing need to exploit recent advances in natural language processing technologies, in
particular language models and deep learning approaches, to enable improved retrieval, classification
and ultimately access to information contained in multiple, heterogeneous types of documents. This is
particularly true for the field of biomedicine and clinical research, where medical experts and scientists
need to carry out complex search queries against a variety of document collections, including literature,
patents, clinical trials or other kinds of content such as EHRs. Indexing documents with structured controlled
vocabularies used for semantic search engines and query expansion purposes is a critical task for enabling
sophisticated user queries and even cross-language retrieval. Due to the complexity of the medical domain
and the use of very large hierarchical indexing terminologies, implementing efficient automatic systems
to aid manual indexing is extremely difficult. This paper provides a summary of the MESINESP task
results on medical semantic indexing in Spanish (BioASQ/ CLEF 2021 Challenge). MESINESP was carried
out in direct collaboration with literature content databases and medical indexing experts using the DeCS
vocabulary, a resource similar to MeSH. Seven participating teams used advanced technologies
including extreme multilabel classification and deep language models to solve this challenge, which can
be viewed as a multi-label classification problem. As part of the MESINESP resources, we have released a Gold Standard
collection of 243,000 documents with a total of 2179 manual annotations divided into train, development
and test subsets covering literature, patents as well as clinical trial summaries, under a cross-genre
training and data labeling scenario. Manual indexing of the evaluation subsets was carried out by three
independent experts using a specially developed indexing interface called ASIT. Additionally, we have
published a collection of large-scale automatic semantic annotations based on NER systems of these
documents with mentions of drugs/medications (170,000), symptoms (137,000), diseases (840,000) and
clinical procedures (415,000). In addition to a summary of the technologies used by the teams, this paper